SYSTEM, METHOD, AND PROGRAM FOR ESTIMATING GENE 
EXPRESSION STATE, AND RECORDING MEDIUM THEREFOR 



Background of the Invention: 
[0001] The present invention relates to a method and system for 

statistical analysis of cDNA microarray data using two different fluorescence 
dyes, and a recording medium for the same. In particular, the present 
invention relates to a system, method, and program for estimating the 
probability of gene expression in each channel, and a recording medium for the 
same. 

[0002] Currently, the study of genomics is expanding from structural 

analysis on individual genes to systematic functional analysis of genes. 
Experiments using cDNA (complementary DNA) microarrays capable of 
simultaneously quantifying the expression levels of a large number of genes are 
expected to be extremely effective in functional analysis of functionally unknown 
genes or whole genes. 

[0003] The objective of experiments using cDNA microarrays with two 

different fluorescence dyes is to detect the difference in gene expression level 
between two kinds of cells. The following gives a summary of a cDNA 
microarray configuration with two different fluorescence dyes. First, cDNAs of 
a large number of sets of genes are densely fixed on glass slides in arrays 
(microarrays) as reference probes. 

[0004] Next, mRNAs extracted from samples under two different conditions,
cell 1 and cell 2 (e.g., a normal cell and a cancer cell), are each labeled
with a fluorescence dye of a different wavelength to synthesize target cDNAs.
These cDNAs are then mixed in equal proportions and hybridized with the
microarrayed cDNAs (the reference probes). After this competitive
hybridization, the glass slides are imaged using a scanner and fluorescence
intensities are measured separately for each dye. The fluorescence dye with
which cell 1 is labeled and the fluorescence dye with which cell 2 is labeled
are read from channel 1 and channel 2, respectively, to obtain gene expression
level data (microarray data).

[0005] Because the process of obtaining microarray data is complicated and
requires advanced experimental techniques, experimental errors can occur at
each stage of the experiment. Therefore, in order to retrieve truly
biologically significant data from the microarray data, analyzing expression
level distributions and experimental errors is a significant challenge.

[0006] In regard to the expression level distributions, prior art
document 1 (Newton et al., 2001, Journal of Computational Biology, Vol. 8, pp.
37-52) can be referred to, for example. Newton et al. propose using the gamma
distribution to model expression levels so as to consider the statistical
characteristics of the ratio of expression levels (the ratio of the expression
levels in channel 1 and channel 2).

[0007] In regard to the observed expression level data, prior art
document 2 (Lee et al., 2000, Proceedings of the National Academy of Sciences,
Vol. 97, No. 18, pp. 9834-9839) can be referred to, for example. Assuming that
true expression levels can be separated into two levels and that accidental
errors exist, Lee et al. adopt a mixed normal distribution as shown in the
following equation (1) to consider the statistical characteristics of the
expression level data:

$$ f(x) = p\,\varphi(x - \mu_1 \mid \sigma_1^2) + (1 - p)\,\varphi(x - \mu_2 \mid \sigma_2^2) \tag{1} $$

[0008] Here, x denotes (the logarithmic value of) a gene expression
level such as fluorescence intensity obtained with a scanner or the like.
Further, the first term of the right-hand side,
$\varphi(x - \mu_1 \mid \sigma_1^2)$, represents the density function of a
normal distribution with average $\mu_1$ and variance $\sigma_1^2$ when a gene
is being expressed, the second term $\varphi(x - \mu_2 \mid \sigma_2^2)$
represents the density function of a normal distribution with average $\mu_2$
and variance $\sigma_2^2$ when no gene is being expressed, and p is a
parameter representing the mixing ratio.

[0009] In regard to the analysis of the experimental errors, several
methods of removing systematic errors, so-called normalization methods, have
been proposed. For example, in prior art document 3 (Chen et al., 1997,
Journal of Biomedical Optics, Vol. 2, pp. 364-374), Chen et al. assume that
the median values of the gene expression levels of the two cells are equal in
order to correct the measured values obtained from channel 1 and channel 2,
respectively. Further, in prior art document 4 (Dudoit et al., 2000,
"Statistical methods for identifying differentially expressed genes in
replicated cDNA microarray experiments," Technical Report #5782), prior art
document 5 (Schuchhardt et al., 2000, Nucleic Acids Research, Vol. 28, No. 10),
and prior art document 6 (Yang et al., 2002, Nucleic Acids Research, Vol. 30,
No. 4), Dudoit, Schuchhardt, and Yang consider that systematic errors are
caused by different locations of spots on glass slides or different
sensitivities of the two kinds of fluorescence dyes, and propose methods of
removing the errors.

[0010] The above-mentioned prior art problems are derived from the 

fact that the analytical results of microarray data lack reproducibility because of 
low precision and efficiency. It is considered that the cause is insufficient 
separation of microarray data into true signals, and systematic and 
measurement errors in the conventional analytical methods. Therefore, 
removal of systematic errors and evaluation of measurement errors are 
important issues. 

[0011] In regard to the removal of systematic errors, a copending
patent application entitled "Method and System for Correction of cDNA
Microarray Data, and Recording Medium Therefor" has been filed separately.
Therefore, the present invention assumes that systematic errors have already
been removed from the microarray data.

[0012] The conventional analytical methods using microarray data deal 

with only the ratio (the difference of logarithmic values) of gene expression 
levels of two channels, that is, they do not deal with the gene expression level 
of each channel, for the reason that quantitative uniformity of cDNA in each spot 
is not ensured. Therefore, the conventional analytical methods result in
insufficient separation between true signals related to gene expression state 
and measurement errors. 

Summary of the Invention: 
[0013] It is an object of the present invention to provide a method and 

system for separating true signals related to gene expression from 
measurement errors to increase the precision and efficiency of analysis using 
microarray data, and further estimating the probability of gene expression in 
each channel. 

[0014] A gene expression state estimating system of the present 

invention includes an input device for inputting microarray data, a 
program-controlled data analyzer, and an output device. The data analyzer 
has parameter estimating means for estimating distributed parameters for each 
component of a mixed normal distribution and a mixing ratio parameter using 
gene expression level data given from the input device, and posterior probability 
calculating means for calculating the posterior probabilities of gene expression 
in each channel using each of the estimated parameters. The calculated 
posterior probabilities are outputted to the output device. 

[0015] The adoption of such a configuration to estimate the state of
gene expression can attain the object of the present invention. 



Brief Description of the Drawings: 

[0016] Fig. 1 is a schematic graph of a mathematical model using S-D 

plots according to the present invention. 

[0017] Fig. 2 is a block diagram showing the structure of a first 

embodiment according to the present invention. 

[0018] Fig. 3 is a flowchart showing the operation of the first 

embodiment according to the present invention. 

[0019] Fig. 4 is a block diagram showing the structure of a second 

embodiment according to the present invention. 

[0020] Fig. 5 is a cumulative distribution graph showing an estimated 

normal distribution of gene expression level data near V=0. 

[0021] Fig. 6 is a graph showing the density function of the estimated 

normal distribution of the gene expression level data near V=0. 

[0022] Fig. 7 is a graph showing S-D plots of gene expression level 

data. 

Description of the Preferred Embodiments: 
[0023] A mathematical model of gene expression level data obtained
from a microarray according to the present invention will first be described.
If X denotes the gene expression level of cell 1 obtained with channel 1 and Y
denotes the gene expression level of cell 2 obtained with channel 2, then the
respective gene expression level data are given by the following equation (2):

$$ X = \tau_1 \alpha + \beta + \varepsilon_1, \qquad Y = \tau_2 \alpha + \beta + \varepsilon_2 \tag{2} $$

where X and Y denote the observed values after adequate transformations,
including logarithmic transformation or power transformation and linear
transformation.

[0024] Here, $\tau_1$ and $\tau_2$ take either 1 or 0, which represents
the presence or absence (ON/OFF) of true gene expression in each cell.
Further, $\alpha$ denotes the amount of mRNA produced when the gene is in the
ON state, a random variable of gene expression defined by the state of the
spot; $\beta$ denotes a measurement error common to channel 1 and channel 2;
and $\varepsilon$ denotes a measurement error independent between the
channels. Each random variable follows the distribution given in the
following equation (3):

$$ \log \alpha \sim N\!\left(\log \mu - \tfrac{\lambda^2}{2},\ \lambda^2\right), \qquad \varepsilon_j \sim N(0, \sigma_\varepsilon^2),\ j = 1, 2, \qquad \beta \sim N(0, \sigma_\beta^2) \tag{3} $$

[0025] Here, $N(\mu, \sigma^2)$ denotes a one-dimensional normal
distribution with average $\mu$ and variance $\sigma^2$. Further, $\alpha$,
$\beta$, and $\varepsilon$ are all independent. In this mathematical model,
when a gene is being expressed (ON state), the true expression level is a
random variable that takes on nonnegative values, while when it is not being
expressed (OFF state), only simple measurement errors are considered to be
observed. Further, referring to prior art document 6, an S-D transformation is
performed as a modification of the M-A transformation adopted by Yang et al.,
as shown in the following equation (4):

$$ U = X + Y, \qquad V = X - Y \tag{4} $$
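The model of equations (2) to (4) can be illustrated with a small simulation.
The following is only a sketch under stated assumptions: the function name
simulate_spot and all numeric parameter values are hypothetical, and log(alpha)
is drawn from a normal distribution whose mean is chosen so that alpha has
expectation mu, which is what the component means used later in the
description imply.

```python
import math
import random

def simulate_spot(tau1, tau2, mu, lam, sigma_eps, sigma_beta, rng):
    """Draw one (U, V) pair from the model of equations (2) to (4)."""
    # Assumption: log(alpha) ~ N(log(mu) - lam^2 / 2, lam^2), so that E[alpha] = mu.
    alpha = math.exp(rng.gauss(math.log(mu) - lam ** 2 / 2, lam))
    beta = rng.gauss(0.0, sigma_beta)   # error shared by both channels
    eps1 = rng.gauss(0.0, sigma_eps)    # channel-specific errors
    eps2 = rng.gauss(0.0, sigma_eps)
    x = tau1 * alpha + beta + eps1      # channel 1, equation (2)
    y = tau2 * alpha + beta + eps2      # channel 2, equation (2)
    return x + y, x - y                 # S-D transformation, equation (4)

# Illustrative parameter values only (not taken from the experiment).
rng = random.Random(0)
pairs = [simulate_spot(1, 0, mu=5.0, lam=0.3, sigma_eps=0.5, sigma_beta=0.2, rng=rng)
         for _ in range(20000)]
mean_u = sum(u for u, _ in pairs) / len(pairs)
mean_v = sum(v for _, v in pairs) / len(pairs)
```

For a gene that is ON in cell 1 and OFF in cell 2, both sample means should
lie close to mu, which corresponds to the location of the ON-OFF cluster in
the S-D plot.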

[0026] In other words, the transformation is made so that U and V are
the sum and difference, respectively, of the gene expression levels of the two
channels. A schematic graph of this S-D transformation model is shown in
Fig. 1. Note that this plot is called the S-D plot. In Fig. 1, $g_{00}$
represents the simultaneous (joint) distribution when a gene is not being
expressed in either cell, $g_{10}$ represents the simultaneous distribution
when a gene is being expressed in cell 1 but not in cell 2, $g_{01}$
represents the simultaneous distribution when a gene is being expressed in
cell 2 but not in cell 1, and $g_{11}$ represents the simultaneous
distribution when a gene is being expressed in both cells. The density
function of the distribution $g_{00}$ is shown in the following equation (5):

$$ g_{00}(u, v \mid \theta) = \varphi(u \mid 4\sigma_\beta^2 + 2\sigma_\varepsilon^2)\,\varphi(v \mid 2\sigma_\varepsilon^2) \tag{5} $$

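[0027] Here, $\varphi(u \mid \sigma^2)$ is the density function of a
one-dimensional normal distribution with average 0 and variance $\sigma^2$.
The density function of the distribution $g_{10}$ is shown in the following
equation (6):

$$ g_{10}(u, v \mid \theta) = \int_{-\infty}^{\infty} \varphi\!\left(u - \mu e^{-\lambda^2/2 + \lambda z} \,\middle|\, 4\sigma_\beta^2 + 2\sigma_\varepsilon^2\right) \varphi\!\left(v - \mu e^{-\lambda^2/2 + \lambda z} \,\middle|\, 2\sigma_\varepsilon^2\right) \varphi(z \mid 1)\, dz = \varphi_2(u - \mu,\ v - \mu \mid \Sigma_{10}) \tag{6} $$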



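[0028] Here, $\varphi_2(u, v \mid \Sigma)$ is the density function of a
two-dimensional normal distribution with average vector 0 and
variance-covariance matrix $\Sigma$, and $\Sigma_{10}$ is a 2x2
variance-covariance matrix, which is shown in the following equation (7):

$$ \Sigma_{10} = \begin{pmatrix} \mu^2(e^{\lambda^2} - 1) + 4\sigma_\beta^2 + 2\sigma_\varepsilon^2 & \mu^2(e^{\lambda^2} - 1) \\ \mu^2(e^{\lambda^2} - 1) & \mu^2(e^{\lambda^2} - 1) + 2\sigma_\varepsilon^2 \end{pmatrix} \tag{7} $$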

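[0029] The density function of the distribution $g_{01}$ is shown in the
following equation (8):

$$ g_{01}(u, v \mid \theta) = \int_{-\infty}^{\infty} \varphi\!\left(u - \mu e^{-\lambda^2/2 + \lambda z} \,\middle|\, 4\sigma_\beta^2 + 2\sigma_\varepsilon^2\right) \varphi\!\left(v + \mu e^{-\lambda^2/2 + \lambda z} \,\middle|\, 2\sigma_\varepsilon^2\right) \varphi(z \mid 1)\, dz = \varphi_2(u - \mu,\ v + \mu \mid \Sigma_{01}) \tag{8} $$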



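[0030] Here, $\Sigma_{01}$ is a 2x2 variance-covariance matrix, which is
shown in the following equation (9):

$$ \Sigma_{01} = \begin{pmatrix} \mu^2(e^{\lambda^2} - 1) + 4\sigma_\beta^2 + 2\sigma_\varepsilon^2 & -\mu^2(e^{\lambda^2} - 1) \\ -\mu^2(e^{\lambda^2} - 1) & \mu^2(e^{\lambda^2} - 1) + 2\sigma_\varepsilon^2 \end{pmatrix} \tag{9} $$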



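[0031] The density function of the distribution $g_{11}$ is shown in the
following equation (10):

$$ g_{11}(u, v \mid \theta) = \varphi(v \mid 2\sigma_\varepsilon^2) \int_{-\infty}^{\infty} \varphi\!\left(u - 2\mu e^{-\lambda^2/2 + \lambda z} \,\middle|\, 4\sigma_\beta^2 + 2\sigma_\varepsilon^2\right) \varphi(z \mid 1)\, dz = \varphi\!\left(u - 2\mu \,\middle|\, 4\mu^2(e^{\lambda^2} - 1) + 4\sigma_\beta^2 + 2\sigma_\varepsilon^2\right) \varphi(v \mid 2\sigma_\varepsilon^2) \tag{10} $$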
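The closed forms on the right-hand sides of equations (5), (6), (8), and (10)
can be evaluated directly. The following is a minimal sketch, assuming the
covariance matrices of equations (7) and (9); the function names phi, phi2,
and component_densities and the numeric values are hypothetical and chosen
only for illustration.

```python
import math

def phi(x, var):
    """Density of a one-dimensional normal N(0, var) at x."""
    return math.exp(-x * x / (2 * var)) / math.sqrt(2 * math.pi * var)

def phi2(x, y, sxx, sxy, syy):
    """Density of a zero-mean bivariate normal with covariance [[sxx, sxy], [sxy, syy]]."""
    det = sxx * syy - sxy * sxy
    quad = (syy * x * x - 2 * sxy * x * y + sxx * y * y) / det
    return math.exp(-quad / 2) / (2 * math.pi * math.sqrt(det))

def component_densities(u, v, mu, lam, sigma_eps, sigma_beta):
    """Return (g00, g10, g01, g11) at (u, v) using the closed forms of (5) to (10)."""
    a = mu ** 2 * (math.exp(lam ** 2) - 1)            # variance of alpha
    noise_u = 4 * sigma_beta ** 2 + 2 * sigma_eps ** 2
    noise_v = 2 * sigma_eps ** 2
    g00 = phi(u, noise_u) * phi(v, noise_v)
    g10 = phi2(u - mu, v - mu, a + noise_u, a, a + noise_v)    # Sigma_10, eq. (7)
    g01 = phi2(u - mu, v + mu, a + noise_u, -a, a + noise_v)   # Sigma_01, eq. (9)
    g11 = phi(u - 2 * mu, 4 * a + noise_u) * phi(v, noise_v)
    return g00, g10, g01, g11

# Illustrative parameter values only.
theta = dict(mu=5.0, lam=0.3, sigma_eps=0.5, sigma_beta=0.2)
g = component_densities(4.0, 3.0, **theta)
g_mirror = component_densities(4.0, -3.0, **theta)
g_origin = component_densities(0.0, 0.0, **theta)
```

Because the ON-OFF and OFF-ON components differ only in the sign of V, the
value of g10 at (u, v) equals the value of g01 at (u, -v), which gives a quick
consistency check on the two covariance matrices.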



[0032] Based on the above-mentioned distributions, the posterior
probabilities of gene expression in cell 1 and cell 2 are shown in the
following equations (11) and (12):

$$ \Pr(\tau_1 = 1 \mid p, \theta) = \frac{p_{10}\, g_{10}(u, v \mid \theta) + p_{11}\, g_{11}(u, v \mid \theta)}{f(u, v \mid p, \theta)} \tag{11} $$

$$ \Pr(\tau_2 = 1 \mid p, \theta) = \frac{p_{01}\, g_{01}(u, v \mid \theta) + p_{11}\, g_{11}(u, v \mid \theta)}{f(u, v \mid p, \theta)} \tag{12} $$

where $f(u, v \mid p, \theta)$ is given by the following equation (13):

$$ f(u, v \mid p, \theta) = \sum_{(j, k) \in \{0, 1\}^2} p_{jk}\, g_{jk}(u, v \mid \theta) \tag{13} $$

[0033] Note that $p = (p_{00}, p_{10}, p_{01}, p_{11})$ is a parameter
representing the mixing ratio for each distribution.
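Equations (11) to (13) amount to a weighted sum of component densities
followed by a normalization. A minimal sketch is given below; the function
name posterior_on, the dictionary layout keyed by the state pair, and the
numeric values are assumptions made only for illustration.

```python
def posterior_on(p, g):
    """Posterior ON probabilities for cell 1 and cell 2, equations (11) to (13).

    p and g are dicts keyed by the state pair (tau1, tau2): p holds the mixing
    ratios and g holds the component densities evaluated at one (u, v) pair.
    """
    f = sum(p[k] * g[k] for k in p)                              # equation (13)
    pr1 = (p[(1, 0)] * g[(1, 0)] + p[(1, 1)] * g[(1, 1)]) / f    # equation (11)
    pr2 = (p[(0, 1)] * g[(0, 1)] + p[(1, 1)] * g[(1, 1)]) / f    # equation (12)
    return pr1, pr2

# Illustrative values only: mixing ratios and densities at a single spot.
p = {(0, 0): 0.4, (1, 0): 0.1, (0, 1): 0.1, (1, 1): 0.4}
g = {(0, 0): 0.02, (1, 0): 0.30, (0, 1): 0.01, (1, 1): 0.05}
pr1, pr2 = posterior_on(p, g)
```

In this example the spot is far more likely to be ON in cell 1 than in
cell 2, because the ON-OFF density dominates the mixture at that point.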

[0034] An embodiment of the present invention will next be described
in detail with reference to the accompanying drawings. Referring to Fig. 2,
the first embodiment of the present invention is a system that formulates a
mathematical model of gene expression level data, estimates the unknown
parameters by applying the formulated model to the data, and uses the
resulting parameter estimates to estimate the posterior probabilities of the
gene expression states in cell 1 and cell 2. The system includes an input
device 1 such as a keyboard, a program-controlled data analyzer 2, and an
output device 3 such as a display device or printer.

[0035] The data analyzer 2 is provided with distributed parameter
estimating means 21, mixing ratio parameter estimating means 22, and
posterior probability calculating means 23. The distributed parameter
estimating means 21 estimates distributed parameters for each component in a
mixed normal distribution using gene expression level data from the input
device 1. The estimated distributed parameters are sent to the mixing ratio
parameter estimating means 22 and the posterior probability calculating means
23. The mixing ratio parameter estimating means 22 estimates a mixing ratio
parameter for the mixed normal distribution by a conditional maximum likelihood
method using the gene expression level data from the input device 1 and the
distributed parameters for each component given from the distributed parameter
estimating means 21. The estimated mixing ratio parameter is sent to the
posterior probability calculating means 23. The posterior probability calculating
means 23 calculates the posterior probability of a gene expression state in each
channel using the gene expression level data from the input device 1, the
distributed parameters for each component given from the distributed parameter
estimating means 21, and the mixing ratio parameter from the mixing ratio
parameter estimating means 22. The calculated posterior probability is sent to
the output device 3.

[0036] Referring next to Figs. 2 and 3, the process of formulating a
mathematical model related to gene expression level data and the process of
estimating unknown parameters by the application of the formulated
mathematical model to the data analysis will be described in detail. Gene
expression level data $\{(u_i, v_i) \mid i = 1, \ldots, n\}$ given from the
input device 1 is sent to the distributed parameter estimating means 21 and
the mixing ratio parameter estimating means 22. The distributed parameter
estimating means 21 estimates $\mu_0$, $\sigma_0$, $\mu_1$, and $\sigma_1$ by
applying the mixed normal distribution of two components, as shown in the
following equation (14), to the data
$\{u_i \mid |v_i| < c_M,\ i = 1, \ldots, n\}$ on the sum of the amounts of
expression of genes near V=0, where $c_M$ denotes the median value of the
absolute differences $|v_i|$ ($i = 1, \ldots, n$) of gene expression levels
(step A1 in Fig. 3):

$$ (1 - \xi)\,\varphi(u - \mu_0 \mid \sigma_0^2) + \xi\,\varphi(u - \mu_1 \mid \sigma_1^2) \tag{14} $$

where $\varphi(\cdot \mid \sigma^2)$ is the density function of a
one-dimensional normal distribution with average 0 and variance $\sigma^2$,
$(\mu_0, \sigma_0^2)$ and $(\mu_1, \sigma_1^2)$ are the average and variance
parameters for the first and second components, respectively, and $\xi$ is the
mixing ratio, with the assumption that $\mu_0 < \mu_1$, $\sigma_0 > 0$,
$\sigma_1 > 0$, and $0 < \xi < 1$ are satisfied.
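Step A1 can be sketched as a data selection followed by a two-component
mixture fit. The description does not name the fitting algorithm, so the use
of the EM algorithm below is an assumption, as are the function names
select_near_zero and fit_two_normals, the initialization, the iteration count,
and the synthetic data.

```python
import math
import random
import statistics

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def select_near_zero(pairs):
    """Keep u_i for the spots whose |v_i| lies below the median c_M of |v|."""
    c_m = statistics.median(abs(v) for _, v in pairs)
    return [u for u, v in pairs if abs(v) < c_m]

def fit_two_normals(us, iters=200):
    """EM fit of (1 - xi) N(mu0, var0) + xi N(mu1, var1), as in equation (14).

    Returns (xi, mu0, var0, mu1, var1). EM is an assumed choice of optimizer.
    """
    s = sorted(us)
    half = len(s) // 2
    mu0, mu1 = statistics.fmean(s[:half]), statistics.fmean(s[half:])
    var0 = var1 = statistics.pvariance(s)
    xi = 0.5
    for _ in range(iters):
        # E-step: responsibility of the second component for each u
        r = []
        for u in us:
            a = (1 - xi) * normal_pdf(u, mu0, var0)
            b = xi * normal_pdf(u, mu1, var1)
            r.append(b / (a + b))
        # M-step: weighted means, variances, and mixing ratio
        n1 = sum(r)
        n0 = len(us) - n1
        mu0 = sum((1 - ri) * u for ri, u in zip(r, us)) / n0
        mu1 = sum(ri * u for ri, u in zip(r, us)) / n1
        var0 = sum((1 - ri) * (u - mu0) ** 2 for ri, u in zip(r, us)) / n0
        var1 = sum(ri * (u - mu1) ** 2 for ri, u in zip(r, us)) / n1
        xi = n1 / len(us)
    return xi, mu0, var0, mu1, var1

# Synthetic OFF-OFF and ON-ON spots (illustrative values only).
rng = random.Random(1)
pairs = ([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(1000)]
         + [(rng.gauss(10, 2), rng.gauss(0, 1)) for _ in range(1000)])
us = select_near_zero(pairs)
xi, mu0, var0, mu1, var1 = fit_two_normals(us)
```

Initializing the two means from the lower and upper halves of the sorted data
keeps the constraint that the first component has the smaller mean in this
well-separated example; real data may need a more careful initialization.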

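[0037] Next, the distributed parameter estimating means 21 uses the
estimated $\hat{\xi}$, $\hat{\mu}_0$, $\hat{\sigma}_0^2$, $\hat{\mu}_1$,
$\hat{\sigma}_1^2$ to estimate $\mu$, $\sigma_\varepsilon^2$, $\sigma_\beta$,
and $\lambda$ according to the following equations (15), (16), (17), and (18)
(step A2):

$$ \hat{\mu} = (\hat{\mu}_1 - \hat{\mu}_0)/2 \tag{15} $$

$$ \hat{\sigma}_\varepsilon^2 = \frac{1}{2\,\|N_0\|} \sum_{i \in N_0} v_i^2 \tag{16} $$

$$ \hat{\sigma}_\beta^2 = \frac{1}{4}\hat{\sigma}_0^2 - \frac{1}{2}\hat{\sigma}_\varepsilon^2 \tag{17} $$

$$ \hat{\lambda} = \sqrt{\log\!\left(1 + \frac{\hat{\sigma}_1^2 - \hat{\sigma}_0^2}{4\hat{\mu}^2}\right)} \tag{18} $$

where $N_0$ denotes the index set of data values that satisfy
$i \in \{i \mid u_i < \hat{\mu}_0\}$ and $\|N_0\|$ denotes the number of its
elements.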
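Step A2 converts the fitted mixture parameters into the model parameters by
matching moments. The sketch below is written under explicit assumptions:
equations (16) and (18) are only partly legible in the source, so the forms
used here are the ones implied by the variance relations of the model
(Var V = 2 sigma_eps^2 in the OFF-OFF state, and
sigma_1^2 - sigma_0^2 = 4 mu^2 (exp(lambda^2) - 1)). The function name
moment_estimates and the numeric inputs are hypothetical.

```python
import math

def moment_estimates(mu0, var0, mu1, var1, pairs):
    """Equations (15) to (18): return (mu, sigma_eps2, sigma_beta2, lam).

    mu0, var0, mu1, var1 are the fitted mixture parameters of step A1 and
    pairs is the list of (u_i, v_i) data.
    """
    mu = (mu1 - mu0) / 2                                          # (15)
    v_off = [v for u, v in pairs if u < mu0]                      # index set N_0
    sigma_eps2 = sum(v * v for v in v_off) / (2 * len(v_off))     # (16), assumed form
    sigma_beta2 = var0 / 4 - sigma_eps2 / 2                       # (17)
    lam = math.sqrt(math.log(1 + (var1 - var0) / (4 * mu ** 2)))  # (18), assumed form
    return mu, sigma_eps2, sigma_beta2, lam

# Illustrative inputs constructed so that the true values are
# mu = 5, sigma_eps^2 = 0.25, sigma_beta^2 = 0.04, lambda = 0.3.
var0 = 4 * 0.04 + 2 * 0.25
var1 = 4 * 25 * (math.exp(0.09) - 1) + var0
pairs = [(-1.0, math.sqrt(0.5)), (-0.5, -math.sqrt(0.5)), (3.0, 2.0)]
mu, sigma_eps2, sigma_beta2, lam = moment_estimates(0.0, var0, 10.0, var1, pairs)
```

With real data the estimate of sigma_beta^2 can turn out negative and the
argument of the logarithm can fall below 1 when var1 is smaller than var0; a
practical implementation would need to check for both cases.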

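[0038] Next, the mixing ratio parameter estimating means 22 estimates
a mixing ratio parameter $p = (p_{00}, p_{10}, p_{01}, p_{11})$ by a
conditional maximum likelihood method, using the estimate
$\hat{\theta} = (\hat{\mu}, \hat{\lambda}, \hat{\sigma}_\varepsilon, \hat{\sigma}_\beta)$
of each parameter given from the distributed parameter estimating means 21, by
applying the two-variable mixed normal distribution of four components shown
in the following equation (19) to the gene expression level data
$\{(u_i, v_i) \mid i = 1, \ldots, n\}$ given from the input device 1
(step A3).

$$ \begin{aligned} & p_{00}\, g_{00}(u, v \mid \hat{\theta}) + p_{10}\, g_{10}(u, v \mid \hat{\theta}) + p_{01}\, g_{01}(u, v \mid \hat{\theta}) + p_{11}\, g_{11}(u, v \mid \hat{\theta}) \\ &\quad = p_{00}\, \varphi(u \mid 4\hat{\sigma}_\beta^2 + 2\hat{\sigma}_\varepsilon^2)\, \varphi(v \mid 2\hat{\sigma}_\varepsilon^2) + p_{10}\, \varphi_2(u - \hat{\mu},\ v - \hat{\mu} \mid \hat{\Sigma}_{10}) \\ &\qquad + p_{01}\, \varphi_2(u - \hat{\mu},\ v + \hat{\mu} \mid \hat{\Sigma}_{01}) + p_{11}\, \varphi(u - 2\hat{\mu} \mid 4\hat{\mu}^2(e^{\hat{\lambda}^2} - 1) + 4\hat{\sigma}_\beta^2 + 2\hat{\sigma}_\varepsilon^2)\, \varphi(v \mid 2\hat{\sigma}_\varepsilon^2) \end{aligned} \tag{19} $$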

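[0039] Here, it is assumed that the above equation satisfies the
relationships shown in the following equation (20) (where $\hat{\Sigma}_{10}$
is a 2x2 variance-covariance matrix derived from the equation (7)) and the
following equation (21) (where $\hat{\Sigma}_{01}$ is a 2x2
variance-covariance matrix derived from the equation (9)).

$$ \hat{\Sigma}_{10} = \begin{pmatrix} \hat{\mu}^2(e^{\hat{\lambda}^2} - 1) + 4\hat{\sigma}_\beta^2 + 2\hat{\sigma}_\varepsilon^2 & \hat{\mu}^2(e^{\hat{\lambda}^2} - 1) \\ \hat{\mu}^2(e^{\hat{\lambda}^2} - 1) & \hat{\mu}^2(e^{\hat{\lambda}^2} - 1) + 2\hat{\sigma}_\varepsilon^2 \end{pmatrix} \tag{20} $$

$$ \hat{\Sigma}_{01} = \begin{pmatrix} \hat{\mu}^2(e^{\hat{\lambda}^2} - 1) + 4\hat{\sigma}_\beta^2 + 2\hat{\sigma}_\varepsilon^2 & -\hat{\mu}^2(e^{\hat{\lambda}^2} - 1) \\ -\hat{\mu}^2(e^{\hat{\lambda}^2} - 1) & \hat{\mu}^2(e^{\hat{\lambda}^2} - 1) + 2\hat{\sigma}_\varepsilon^2 \end{pmatrix} \tag{21} $$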
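In step A3 only the mixing ratios are free, because the component densities
are fixed by the estimates from steps A1 and A2. The description calls this a
conditional maximum likelihood method without naming an optimizer; the sketch
below assumes an EM-style fixed-point update of the weights. The function name
estimate_mixing_ratio, the iteration count, and the toy input are
hypothetical.

```python
def estimate_mixing_ratio(densities, iters=500):
    """Estimate p = (p00, p10, p01, p11) with the component densities held fixed.

    densities[i] is the tuple (g00, g10, g01, g11) evaluated at spot i using
    the already estimated theta. EM is an assumed choice of optimizer.
    """
    k = len(densities[0])
    p = [1.0 / k] * k
    for _ in range(iters):
        totals = [0.0] * k
        for g in densities:
            f = sum(pj * gj for pj, gj in zip(p, g))   # mixture density at this spot
            for j in range(k):
                totals[j] += p[j] * g[j] / f           # posterior weight of component j
        p = [t / len(densities) for t in totals]
    return p

# Toy input in which every spot is explained by exactly one component.
densities = ([(1.0, 0.0, 0.0, 0.0)] * 6 + [(0.0, 1.0, 0.0, 0.0)]
             + [(0.0, 0.0, 1.0, 0.0)] + [(0.0, 0.0, 0.0, 1.0)] * 2)
p_hat = estimate_mixing_ratio(densities)
```

Each update keeps the weights nonnegative and summing to one, so no explicit
constraint handling is needed in this sketch.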



[0040] The process of estimating posterior probabilities of gene 

expression states in cell 1 and cell 2 using the calculated estimates of 
parameters will next be described. 

[0041] The posterior probability calculating means 23 can describe the
posterior probabilities of the gene expression state in each cell for each
pair (u, v) of the gene expression level data given from the input device 1,
using the estimates
$\hat{\theta} = (\hat{\mu}, \hat{\lambda}, \hat{\sigma}_\varepsilon, \hat{\sigma}_\beta)$
and $\hat{p} = (\hat{p}_{00}, \hat{p}_{10}, \hat{p}_{01}, \hat{p}_{11})$ of
each parameter given from the distributed parameter estimating means 21 and
the mixing ratio parameter estimating means 22.

[0042] In other words, the posterior probabilities that gene expression
is in the ON state in cell 1 and in cell 2 can be calculated from the
following equations (22) and (23) (step A4).

$$ \Pr(\tau_1 = 1 \mid \hat{p}, \hat{\theta}) = \frac{\hat{p}_{10}\, g_{10}(u, v \mid \hat{\theta}) + \hat{p}_{11}\, g_{11}(u, v \mid \hat{\theta})}{f(u, v \mid \hat{p}, \hat{\theta})} \tag{22} $$

$$ \Pr(\tau_2 = 1 \mid \hat{p}, \hat{\theta}) = \frac{\hat{p}_{01}\, g_{01}(u, v \mid \hat{\theta}) + \hat{p}_{11}\, g_{11}(u, v \mid \hat{\theta})}{f(u, v \mid \hat{p}, \hat{\theta})} \tag{23} $$



[0043] It is then judged whether the posterior probabilities that gene
expression is in the ON state have been calculated for all the pairs (u, v) of
the gene expression level data (step A5). When all the calculations have been
completed, the process is ended; when they have not yet been completed, the
posterior probability related to the next gene is calculated.

[0044] The calculated posterior probabilities of gene expression in 

each channel are sent to the output device 3. The output device 3 displays or 
prints out the posterior probabilities of gene expression in each channel in the 
form of a graph. 

[0045] The following describes the effects of the embodiment. In the 

embodiment, a mathematical model in which the concept of gene 
expression/nonexpression is introduced is constructed to separate true signals 
from experimental errors. Further, the use of data on the sum and difference 
of gene expression levels in two channels makes it easy to obtain information 
on the sensitivity of microarray data to fluorescence intensities in each channel, 
allowing more accurate extraction of the magnitude of experimental errors. 
Furthermore, a two-dimensional simultaneous distribution is described for these 
sum and difference data. It allows high-precision estimation of posterior 
probabilities of gene expression in each channel. 

[0046] In addition, the posterior probability of differential expression
between cell 1 and cell 2 (mismatched ON-OFF state) (step A4 in Fig. 3) is
calculated by the following equation (24).

$$ \Pr(\tau_1 \neq \tau_2 \mid \hat{p}, \hat{\theta}) = \frac{\hat{p}_{10}\, g_{10}(u, v \mid \hat{\theta}) + \hat{p}_{01}\, g_{01}(u, v \mid \hat{\theta})}{f(u, v \mid \hat{p}, \hat{\theta})} \tag{24} $$



[0047] Thus, the embodiment has the advantage of detecting candidate
genes that are likely to reveal differential expression in cell 1 and cell 2.
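The detection of candidate genes from equation (24) can be sketched as a
scoring and filtering step. The function names prob_differential and
candidate_genes, the threshold of 0.8, and the numeric values are assumptions
introduced only for illustration; the description does not specify a cut-off.

```python
def prob_differential(p, g):
    """Equation (24): posterior probability that tau1 != tau2 at one spot.

    p and g are sequences ordered as (00, 10, 01, 11).
    """
    f = sum(pj * gj for pj, gj in zip(p, g))
    return (p[1] * g[1] + p[2] * g[2]) / f

def candidate_genes(p, densities, threshold=0.8):
    """Indices of spots whose posterior reaches the threshold, most probable first."""
    scored = [(prob_differential(p, g), i) for i, g in enumerate(densities)]
    return [i for s, i in sorted(scored, reverse=True) if s >= threshold]

# Illustrative mixing ratios and component densities for three spots.
p = (0.4, 0.1, 0.1, 0.4)
densities = [
    (0.02, 0.30, 0.01, 0.05),    # moderately likely ON-OFF
    (0.50, 0.001, 0.001, 0.10),  # almost surely OFF-OFF or ON-ON
    (0.01, 0.00, 0.40, 0.01),    # very likely OFF-ON
]
strict = candidate_genes(p, densities)
loose = candidate_genes(p, densities, threshold=0.5)
```

Lowering the threshold trades specificity for sensitivity: the strict call
keeps only the third spot, while the looser call also admits the first.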



[0048] A second embodiment of the present invention will next be
described in detail with reference to Fig. 4. Like the first embodiment, the
second embodiment of the present invention includes the input device, data
analyzer, and the output device. The second embodiment also includes a
recording medium 4 with a data analyzing program recorded on it. The
recording medium 4 may be either portable or fixed type, such as a magnetic
disk, semiconductor memory, CD-ROM, or any other recording medium.
Alternatively, a computer program capable of executing the method of the
present invention may be stored in a memory device of a computer connected
to a network so that it can be transferred to another computer through the
network. The medium that provides a computer program executing the algorithm
is distributable as a medium readable in a variety of computer formats, and is
not limited to a specific type.

[0049] The data analyzing program is read from the recording medium
4 into a data analyzer 5 to control the operation of the data analyzer 5 to
execute processing on data files inputted from the input device 1 in the same
manner as the data analyzer 2 does in the first embodiment.

[0050] The following specifically describes the embodiments of the
present invention. Data used as an example is obtained from an experiment
for comparing the states of gene expression of two different types of cancer
cells (cell 1 and cell 2).

[0051] The test is conducted on expression patterns of 48 grids on one
chip, 441 (21x21) spots per grid, a total of 21168 genes.

[0052] Figs. 5 and 6 show the estimation results for the distribution of U,
the sum of expression levels, when cell 1 and cell 2 both show the OFF state
or both show the ON state of gene expression (V=0): each component
distribution, the mixed normal distribution, and a single-peaked normal
distribution for contrast purposes. The following Table 1 shows the estimation
results of the distributed parameters for each component (results of step A1).

[Table 1]

(Program output listing; only the following entries are legible.)

Sample size = 14786
Job termination status = Normally terminated

Columns: Mean, SD, Rate (%)
Rows: Single Component (Rate = 100.00); Two Components Mixed

The remaining entries (data set name, target variable, convergence settings,
component means and SDs, log-likelihoods, and the log likelihood ratio
statistic) are not legible.

[0053] Fig. 5 shows cumulative distribution functions and Fig. 6 shows 

density functions, in which the thin solid line indicates the estimation results of 
an assumed mixed normal distribution, the chain double-dashed line indicates 
the estimation results of its first component (OFF-OFF), the bold solid line 
indicates the estimation results of its second component (ON-ON), and the 
dashed line indicates the estimation results of an assumed single-peaked 
normal distribution. 

[0054] The long and short dashed line in Fig. 5 indicates an empirical
cumulative distribution function based on the observed data, showing that it
closely follows the curve of the mixed normal distribution (thin solid line)
estimated from the observed values.
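The comparison between the empirical cumulative distribution function and the
fitted curves can also be expressed numerically. The sketch below uses the
largest vertical gap between the two curves (the Kolmogorov-Smirnov distance)
as the measure of fit; this choice of measure, the function names, and the
synthetic data are assumptions, since the description compares the curves
only graphically.

```python
import math
import random
import statistics

def normal_cdf(x, mean, var):
    return 0.5 * (1 + math.erf((x - mean) / math.sqrt(2 * var)))

def mixture_cdf(x, xi, mu0, var0, mu1, var1):
    """CDF of the two-component mixed normal distribution of equation (14)."""
    return (1 - xi) * normal_cdf(x, mu0, var0) + xi * normal_cdf(x, mu1, var1)

def max_cdf_gap(us, cdf):
    """Largest gap between the empirical CDF of us and a model CDF."""
    s = sorted(us)
    n = len(s)
    return max(max(abs((i + 1) / n - cdf(x)), abs(i / n - cdf(x)))
               for i, x in enumerate(s))

# Synthetic bimodal data (illustrative values only).
rng = random.Random(2)
us = [rng.gauss(0, 1) for _ in range(1000)] + [rng.gauss(10, 2) for _ in range(1000)]
gap_mixed = max_cdf_gap(us, lambda x: mixture_cdf(x, 0.5, 0.0, 1.0, 10.0, 4.0))
m, s2 = statistics.fmean(us), statistics.pvariance(us)
gap_single = max_cdf_gap(us, lambda x: normal_cdf(x, m, s2))
```

For bimodal data of this kind the single-peaked normal distribution leaves a
much larger gap than the mixed normal distribution, which mirrors the visual
contrast described for Fig. 5.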

[0055] In Fig. 6, the asterisk marks indicate observed data. Although the
asterisk marks are discernible at both ends, they overlap densely in the range
of gene expression levels from about 0 to 30, so within that range they are
replaced with the hatching patterns (1) to (5). The hatching patterns (1) to
(5) represent the magnitude of the posterior probability of belonging to the
first component. In other words, the solidly shaded area (1) indicates the
range of posterior probabilities from 0 to 0.2, the hatching area (2) from 0.2
to 0.4, the hatching area (3) from 0.4 to 0.6, the hatching area (4) from 0.6
to 0.8, and the hatching area (5) from 0.8 to 1.0.

[0056] Fig. 7 shows an S-D plot of gene expression level data, in which
the abscissa indicates the sum of logarithmic values of gene expression levels
of cell 1 and cell 2, and the ordinate indicates the difference between the
logarithmic values. In Fig. 7, the hatching patterns represent the magnitude of
posterior probabilities indicating mismatched expression states (ON-OFF or
OFF-ON) between cell 1 and cell 2. In other words, the solidly shaded area (1)
indicates the range of posterior probabilities from 0 to 0.2, the hatching area (2)
from 0.2 to 0.4, the hatching area (3) from 0.4 to 0.6, the hatching area (4) from
0.6 to 0.8, and the hatching area (5) from 0.8 to 1.0.
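The five hatching classes used in Figs. 6 and 7 correspond to equal-width
bins of the posterior probability. A minimal sketch follows; the function name
hatching_class is hypothetical, and the handling of values that fall exactly
on an interior bin edge is an assumption, since the description does not say
to which class a boundary value belongs.

```python
def hatching_class(prob):
    """Map a posterior probability in [0, 1] to a hatching class 1 to 5.

    Each class covers a 0.2-wide range; a probability of exactly 1.0 is
    placed in class 5.
    """
    if not 0.0 <= prob <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return min(int(prob * 5) + 1, 5)

# Illustrative posterior probabilities and their classes.
classes = [hatching_class(q) for q in (0.0, 0.1, 0.35, 0.5, 0.79, 0.93, 1.0)]
```

Spots in class 5 of Fig. 7 are the ones with a posterior probability of
mismatched expression of at least 0.8.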

[0057] The following Table 2 shows the estimation results of the
distributed parameters by the conditional maximum likelihood method (results
of steps A2 and A3 in Fig. 3). Each plotted spot in Fig. 7 corresponds to a
gene, and this makes it easy to narrow down gene candidates related to the
difference between cell 1 and cell 2.
[Table 2]

(Program output listing; only the following entries are legible.)

SIGMA_epsilon = .858
SIGMA_beta = .133
P11 = .561

The values of LAMBDA, P10, P01, and P00 are not legible.

[0058] The first effect of the present invention is to achieve a
separation between true signals related to gene expression and experimental
errors by introducing the concept of gene expression/nonexpression into the
gene expression level data obtained from microarrays to construct a
mathematical model.

[0059] The second effect of the present invention is to make it easy to
obtain information on the sensitivity of microarray data to the fluorescence
intensities of the two channels by transforming the gene expression level data
obtained from microarrays into data on the sum and difference of the gene
expression levels between the two channels. This in turn makes it possible to
visualize the magnitude of experimental errors.

[0060] The third effect of the present invention is to enable estimation 

of posterior probability related to expression/nonexpression of each gene in 
each of the two channels by transforming the gene expression level data 
obtained from microarrays into data on the sum and difference of the gene 
expression levels between two channels to describe a two-dimensional 
simultaneous distribution of the sum and difference data. It then makes it 
possible to detect genes related to differences between cell 1 and cell 2 with 
high precision. 



